Skip to content

fix joins for missing key columns#187

Merged
mchav merged 9 commits intoDataHaskell:mainfrom
Anamika1608:fix/missing-join-columns
Mar 25, 2026
Merged

fix joins for missing key columns#187
mchav merged 9 commits intoDataHaskell:mainfrom
Anamika1608:fix/missing-join-columns

Conversation

@Anamika1608
Copy link
Contributor

closes #118

summary

this fixes the join behavior when a requested join key does not exist. instead of silently hashing an empty key set and producing cartesian style output, joins now fail fast with the existing columnnotfoundexception.

what changed

  • added a shared validation helper in join.hs that checks requested join keys against dataframe column indices before hashes are computed
  • routed innerjoin, leftjoin, rightjoin, fullouterjoin, and buildhashcolumn through that validation path
  • kept the public join signatures unchanged
  • preserved the correct rightjoin callpoint by routing it through an internal leftjoin helper with an explicit callpoint
  • added regression tests for missing join keys across inner, left, right, and full outer joins
  • updated subset.hs to use randomgen and split so the test suite builds under the current nix toolchain

behavior

before this change, a missing join column could fall through to hashing zero key columns, which made every row hash the same and could expand into cartesian style output. after this change, the join fails immediately with the missing column name and the existing available column context.

verification

  • source /nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh
  • nix --extra-experimental-features 'nix-command flakes' develop --no-write-lock-file -c cabal test test:tests --test-show-details=direct --test-options='-p Operations.Join'\n- nix --extra-experimental-features 'nix-command flakes' develop --no-write-lock-file -c cabal test test:tests --test-show-details=direct\n\n## results\n- full tests suite passed\n- 445 hunit cases passed\n- 0 failures\n- 0 errors\n- all quickcheck properties passed

@Anamika1608
Copy link
Contributor Author

@daikonradish @mchav, can you review this pr?

@daikonradish
Copy link
Contributor

We've left some comments, but in general your code looks good. Some of it is very stylish! Thank you for your effort.

@Anamika1608
Copy link
Contributor Author

@mchav @daikonradish fixed all the issues. check once?

@Anamika1608
Copy link
Contributor Author

@mchav check now?

@Anamika1608 Anamika1608 force-pushed the fix/missing-join-columns branch from be407f8 to 8d3d381 Compare March 25, 2026 06:58
@Anamika1608 Anamika1608 force-pushed the fix/missing-join-columns branch from 8d3d381 to b740b06 Compare March 25, 2026 07:00
@Anamika1608
Copy link
Contributor Author

@mchav check now?

@mchav mchav merged commit 4240449 into DataHaskell:main Mar 25, 2026
14 of 22 checks passed
@mchav
Copy link
Member

mchav commented Mar 25, 2026

Thank you @Anamika1608

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug] joins with incorrect names should error out, not OUTER JOIN.

3 participants